Day 10 - Sorting and reducing


$ seq 1 20 | sort -nr | head -n 5
20
19
18
17
16

The final command that I want to show you in this chapter appears often after sort and is called uniq. Its job is to remove duplicated lines, leaving only one occurrence of each. This command, however, compares each line only with the one that follows it, which is the reason why we run it after a sort.
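
To see why the preceding sort matters, here is a quick check on a throwaway three-line input built with printf (my own ad-hoc example, not the chapter’s examples.txt). Without sorting, the two occurrences of cat are not adjacent, so uniq keeps both; after a sort they end up next to each other and get collapsed.

$ printf 'cat\ndog\ncat\n' | uniq
cat
dog
cat

$ printf 'cat\ndog\ncat\n' | sort | uniq
cat
dog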

The file examples.txt contains the word cat several times (because I’m not a cat lover, I’m a feline worshipper. Guess what my favourite bash command is). You can notice that the plain sort command lists that word three times in a row. If you run

$ cat examples.txt | sort | uniq

though, you will see it listed only once. Not everybody hates duplicates, though (ask Gaius Baltar), so uniq has several options that perform different tasks, for example printing only the duplicate lines, as shown below. At any rate, in my experience the default behaviour is the most useful one.
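
For instance, the -d option makes uniq print one copy of each line that appears more than once and nothing else. With the examples.txt used in this chapter (where, as the counts below will show, only cat and dog are repeated) it should give something like

$ cat examples.txt | sort | uniq -d
cat
dog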

Is there anything that people like more than sorting? Oh yeah, and it is pizza! Oh, sorry, I must have messed up my notes. What was I saying? Oh yes, what do we love more than sorting? Counting, naturally!

So, uniq can compress a sorted text, removing duplicated lines while counting them, and give as output a nice report of the number of times each line appeared.

$ cat examples.txt | sort | uniq -c
1
1 007
1 aardvark
1 basilisk
1 beholder
1 Big Bad Wolf
1 bull
1 C-3PO
3 cat
1 corn dog
1 Cyborg 009
1 direwolf
2 dog
1 dryad
1 Dug the Dog